1 ProteinLFQ

1.1 Packages

Load required packages into R session.

1.2 Functions

## SHA-1 hash of file is 46d72ef9fbf7843a06a8c125d64eaf3c02bfa2e5

1.3 Workflow

Load Protein Measurements spreadsheet exported from Progenesis.

Separate the leading protein accession from its group members.

  • Some protein accessions were grouped together by semicolons (;) in the data. These groups represent homologous proteins with shared peptide identifications. The first accession in each group is generally the most confident inference and is used for analysis; the other accessions are treated as group members.
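
A minimal sketch of this split in base R. The accession values and the `Leading` and `Members` column names are made up for illustration; they are not taken from the Progenesis export.

```r
# Toy data: accession groups joined by semicolons (hypothetical values).
proteins <- data.frame(
  Accession = c("P12154;P09144", "Q00914", "P50369;P23230;P12154"),
  stringsAsFactors = FALSE
)

# Split each group on the literal ";" separator.
parts <- strsplit(proteins$Accession, ";", fixed = TRUE)

# The first accession leads the group; the rest become group members.
proteins$Leading <- vapply(parts, function(p) p[1], character(1))
proteins$Members <- vapply(parts, function(p) paste(p[-1], collapse = ";"),
                           character(1))
```

Accessions without a semicolon end up with an empty `Members` string, so the leading accession is always defined.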

Define column indices for the normalized abundance values.

  • The Protein Measurements export from Progenesis contains the normalized and raw abundances for each raw file in the experiment.
##  [1] "Accession"              "Group members"         
##  [3] "Peptide count"          "Unique peptides"       
##  [5] "Confidence score"       "Anova (p)"             
##  [7] "Max fold change"        "Highest mean condition"
##  [9] "Lowest mean condition"  "Description"           
## [11] "20141222_WOS521"        "20141222_WOS526"       
## [13] "20141222_WOS5211"       "20141222_WOS5216"      
## [15] "20141222_WOS522"        "20141222_WOS527"       
## [17] "20141222_WOS5212"       "20141222_WOS5217"      
## [19] "20141222_WOS523"        "20141222_WOS528"       
## [21] "20141222_WOS5213"       "20141222_WOS5218"      
## [23] "20141222_WOS521_1"      "20141222_WOS526_1"     
## [25] "20141222_WOS5211_1"     "20141222_WOS5216_1"    
## [27] "20141222_WOS522_1"      "20141222_WOS527_1"     
## [29] "20141222_WOS5212_1"     "20141222_WOS5217_1"    
## [31] "20141222_WOS523_1"      "20141222_WOS528_1"     
## [33] "20141222_WOS5213_1"     "20141222_WOS5218_1"
##  [1] 11 12 13 14 15 16 17 18 19 20 21 22
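
One way to derive such indices without hard-coding them, sketched under the assumption that the normalized abundance block starts immediately after the "Description" column and has one column per raw file. The run names and `n_runs` below are toy values.

```r
# Toy column layout: metadata columns, then a normalized abundance block,
# then a raw abundance block whose duplicated names carry a "_1" suffix.
cols <- c("Accession", "Description", "run_A", "run_B", "run_C",
          "run_A_1", "run_B_1", "run_C_1")
n_runs <- 3  # number of raw files in the experiment (assumed known)

# Assumption: the normalized block starts right after "Description".
first_abundance <- match("Description", cols) + 1
norm_idx <- seq(first_abundance, length.out = n_runs)
norm_idx
```

With the listing above, "Description" is column 10 and there are 12 raw files, so the same arithmetic gives columns 11 through 22.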

Filter to remove proteins from the contaminant database.

Filter to remove proteins without enough peptide evidence.

  • Proteins are identified by unique and shared PSMs, so confidence in a protein identification generally increases with the number of unique peptides. It is common in proteomics to remove “one-hit-wonders” by requiring > 1 unique peptide per protein.
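
A small sketch of the unique-peptide filter in base R. The column name "Unique peptides" matches the export header listed earlier; the rows are toy data.

```r
# Toy protein table; check.names = FALSE keeps the space in the column name.
proteins <- data.frame(
  Accession = c("P12154", "Q00914", "P50369"),
  `Unique peptides` = c(1, 2, 5),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep proteins supported by more than one unique peptide.
filtered <- proteins[proteins[["Unique peptides"]] > 1, , drop = FALSE]
```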

Select the identifier and abundance columns.

Example plot construction.

2 PeptideLFQ

Peptide-Level Quantification

  • During raw MS file processing, Progenesis subdivided each LC-MS run into peak features, each an MS1 precursor ion with a defined isotopic cluster, a characteristic retention time, and a monoisotopic mass. The same peak feature coordinates are assigned for every LC-MS run and the summed intensity (abundance) from each is recorded.

  • MS2 spectra from data-dependent acquisition (DDA) contain the associated MS1 precursor ion mass and retention time, allowing them to be mapped to peak features.

  • The Peptide Measurements export from Progenesis contains the normalized abundances, raw abundances, and spectral counts for each identified peptide that was mapped to an MS1 peak feature in each raw file.

2.1 Packages

Load required packages into R session.

2.3 Workflow

2.3.2 Redox

Load Peptide Measurements spreadsheet exported from Progenesis.

Load Protein Measurements spreadsheet exported from Progenesis.

Load protein sequence database.

##   A AAStringSet instance of length 18944
##         width seq                                      names               
##     [1]   751 MTISTPEREAKKVKIAVDR...GGIATTWSFFLARIISVG sp|P12154|PSAA_CHLRE
##     [2]   735 MATKLFPKFSQGLAQDPTT...YIFTYAAFLIASTSGRFG sp|P09144|PSAB_CHLRE
##     [3]    81 MAHIVKIYDTCIGCTQCVR...SVRVYLGSESTRSMGLSY sp|Q00914|PSAC_CHLRE
##     [4]    43 MIFDFNYIHIFMLTITSYV...LVFTLGIYLGLLKVVKLI sp|P50369|PETL_CHLRE
##     [5]   160 MSVTKKPDLSDPVLKAKLA...LGIGSTFPIDISLTLGLF sp|P23230|PETD_CHLRE
##     ...   ... ...
## [18940]   238 MSKGEELFTGVVPILVELD...LEFVTAAGITHGMDELYK sp|GFP_AEQVI
## [18941]   204 MAEEVEEERLKYLDFVRAA...LPLLPTEKITKVFGDEAS sp|SRPP_HEVBR
## [18942]   138 MAEDEDNQQGQGEGLKYLG...SSLPGQTKILAKVFYGEN sp|REF_HEVBR
## [18943]   348 MFSSVMVALVSLAVAVSAN...VMNADNHEYFSENNPAQS sp|PLMP_GRIFR
## [18944]   271 MSHIQRETSCSRPRLNSNL...DNPDMNKLQFHLMLDEFF sp|KKA1_ECOLX

Define column indices for the normalized abundance values.

##  [1] "#"                                    
##  [2] "Retention time (min)"                 
##  [3] "Charge"                               
##  [4] "m/z"                                  
##  [5] "Measured mass"                        
##  [6] "Mass error (u)"                       
##  [7] "Mass error (ppm)"                     
##  [8] "Score"                                
##  [9] "Sequence"                             
## [10] "Modifications"                        
## [11] "Accession"                            
## [12] "Description"                          
## [13] "Use in quantitation"                  
## [14] "Max fold change"                      
## [15] "Highest mean condition"               
## [16] "Lowest mean condition"                
## [17] "Anova"                                
## [18] "Maximum CV"                           
## [19] "20190123_EWM_AZD1_R0_1_2h_5-35_5uL"   
## [20] "20190123_EWM_AZD1_R0_2_2h_5-35_5uL"   
## [21] "20190123_EWM_AZD1_R0_3_2h_5-35_5uL"   
## [22] "20190123_EWM_AZD1_R0_4_2h_5-35_5uL"   
## [23] "20190123_EWM_AZD1_R15_1_2h_5-35_5uL"  
## [24] "20190123_EWM_AZD1_R15_2_2h_5-35_5uL"  
## [25] "20190123_EWM_AZD1_R15_3_2h_5-35_5uL"  
## [26] "20190123_EWM_AZD1_R15_4_2h_5-35_5uL"  
## [27] "20190123_EWM_AZD1_R30_1_2h_5-35_5uL"  
## [28] "20190123_EWM_AZD1_R30_2_2h_5-35_5uL"  
## [29] "20190123_EWM_AZD1_R30_3_2h_5-35_5uL"  
## [30] "20190123_EWM_AZD1_R30_4_2h_5-35_5uL"  
## [31] "20190123_EWM_AZD1_R60_1_2h_5-35_5uL"  
## [32] "20190123_EWM_AZD1_R60_2_2h_5-35_5uL"  
## [33] "20190123_EWM_AZD1_R60_3_2h_5-35_5uL"  
## [34] "20190123_EWM_AZD1_R60_4_2h_5-35_5uL"  
## [35] "20190123_EWM_AZD1_R0_1_2h_5-35_5uL_1" 
## [36] "20190123_EWM_AZD1_R0_2_2h_5-35_5uL_1" 
## [37] "20190123_EWM_AZD1_R0_3_2h_5-35_5uL_1" 
## [38] "20190123_EWM_AZD1_R0_4_2h_5-35_5uL_1" 
## [39] "20190123_EWM_AZD1_R15_1_2h_5-35_5uL_1"
## [40] "20190123_EWM_AZD1_R15_2_2h_5-35_5uL_1"
## [41] "20190123_EWM_AZD1_R15_3_2h_5-35_5uL_1"
## [42] "20190123_EWM_AZD1_R15_4_2h_5-35_5uL_1"
## [43] "20190123_EWM_AZD1_R30_1_2h_5-35_5uL_1"
## [44] "20190123_EWM_AZD1_R30_2_2h_5-35_5uL_1"
## [45] "20190123_EWM_AZD1_R30_3_2h_5-35_5uL_1"
## [46] "20190123_EWM_AZD1_R30_4_2h_5-35_5uL_1"
## [47] "20190123_EWM_AZD1_R60_1_2h_5-35_5uL_1"
## [48] "20190123_EWM_AZD1_R60_2_2h_5-35_5uL_1"
## [49] "20190123_EWM_AZD1_R60_3_2h_5-35_5uL_1"
## [50] "20190123_EWM_AZD1_R60_4_2h_5-35_5uL_1"
## [51] "20190123_EWM_AZD1_R0_1_2h_5-35_5uL_2" 
## [52] "20190123_EWM_AZD1_R0_2_2h_5-35_5uL_2" 
## [53] "20190123_EWM_AZD1_R0_3_2h_5-35_5uL_2" 
## [54] "20190123_EWM_AZD1_R0_4_2h_5-35_5uL_2" 
## [55] "20190123_EWM_AZD1_R15_1_2h_5-35_5uL_2"
## [56] "20190123_EWM_AZD1_R15_2_2h_5-35_5uL_2"
## [57] "20190123_EWM_AZD1_R15_3_2h_5-35_5uL_2"
## [58] "20190123_EWM_AZD1_R15_4_2h_5-35_5uL_2"
## [59] "20190123_EWM_AZD1_R30_1_2h_5-35_5uL_2"
## [60] "20190123_EWM_AZD1_R30_2_2h_5-35_5uL_2"
## [61] "20190123_EWM_AZD1_R30_3_2h_5-35_5uL_2"
## [62] "20190123_EWM_AZD1_R30_4_2h_5-35_5uL_2"
## [63] "20190123_EWM_AZD1_R60_1_2h_5-35_5uL_2"
## [64] "20190123_EWM_AZD1_R60_2_2h_5-35_5uL_2"
## [65] "20190123_EWM_AZD1_R60_3_2h_5-35_5uL_2"
## [66] "20190123_EWM_AZD1_R60_4_2h_5-35_5uL_2"
##  [1] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

Filter to keep peptides with Percolator-adjusted Mascot scores > 13 (p-value < 0.05).

Filter to remove peptides from proteins in the contaminant database.

Join the protein identification statistics onto the Peptide Measurement data.

  • Each peptide was assigned to a single protein accession, which will have identification statistics in the Protein Measurements export file.
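
The join can be sketched with base R's `merge()`. The table contents and the `UniquePeptides` and `ConfidenceScore` column names are simplified stand-ins for the export headers.

```r
# Toy peptide table: each peptide carries a single protein accession.
peptides <- data.frame(
  Sequence  = c("AAK", "LLR", "GGK"),
  Accession = c("P2", "P1", "P2"),
  Score     = c(40, 25, 18),
  stringsAsFactors = FALSE
)

# Toy protein-level identification statistics.
protein_stats <- data.frame(
  Accession       = c("P1", "P2"),
  UniquePeptides  = c(3, 7),
  ConfidenceScore = c(120.5, 310.2),
  stringsAsFactors = FALSE
)

# Left join: keep every peptide row and attach its protein's statistics.
joined <- merge(peptides, protein_stats, by = "Accession", all.x = TRUE)
```

Note that `merge()` sorts the result by the key column by default, so row order differs from the input.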

Summarize duplicate peak features.

  • The Peptide Measurements export usually contains duplicated peak features with differing peptide identifications:

    • Some features were matched with peptides having identical sequence, modifications, and score, but alternate protein accessions. These groups were reduced to satisfy the principle of parsimony and represented by the protein accession with the highest number of unique peptides or, in case of a tie, the protein with the largest confidence score assigned by Progenesis.

    • Some features were also duplicated with differing peptide identifications and were reduced to a single peptide with the highest Mascot ion score.
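
Both reduction rules can be expressed as a single ranking per feature. This is a sketch under stated assumptions, not the workflow's own code: the column names are simplified, and the rows are toy data.

```r
# Toy peptide table with duplicated peak features (Feature stands in for "#").
peptides <- data.frame(
  Feature        = c(1, 1, 2, 2, 3),
  Sequence       = c("AAK", "AAK", "LLR", "GGK", "TTR"),
  Accession      = c("P1", "P2", "P3", "P4", "P5"),
  Score          = c(40, 40, 25, 31, 18),
  UniquePeptides = c(3, 7, 2, 2, 4),
  Confidence     = c(120, 310, 90, 95, 60),
  stringsAsFactors = FALSE
)

# Rank rows within each feature: highest score first, then most unique
# peptides, then highest protein confidence score.
ord <- order(peptides$Feature, -peptides$Score,
             -peptides$UniquePeptides, -peptides$Confidence)
ranked <- peptides[ord, ]

# Keep only the top-ranked row for each feature.
reduced <- ranked[!duplicated(ranked$Feature), ]
```

Feature 1 has two identical identifications with tied scores, so the accession with more unique peptides (P2) is kept; feature 2 has two different peptides, so the higher-scoring one (GGK) is kept.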

Select the identifier and abundance columns.

## Warning: slice_() is deprecated. 
## Please use slice() instead
## 
## The 'programming' vignette or the tidyeval book can help you
## to program with slice() : https://tidyeval.tidyverse.org
## This warning is displayed once per session.
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## please use list() instead
## 
## # Before:
## funs(name = f(.)
## 
## # After: 
## list(name = ~f(.))
## This warning is displayed once per session.

2.3.3 Variable Modification

2.3.4 Phosphorylation

3 StatLFQ

3.1 Packages

Load required packages into R session.

3.2 Functions

## SHA-1 hash of file is df904b8435ddee2f1af311fd8f1af701db112eda

3.3 Workflow

Define input data.

Define the column indices for replicates in each condition.

  • This assumes the input data has a simplified format: an identifier column followed by abundance columns for each raw file in the experiment.
## [1] 2 3 4 5
## [1] 6 7 8 9
## [1] 10 11 12 13
## $`25`
## [1] 2 3 4 5
## 
## $`50`
## [1] 6 7 8 9
## 
## $`100`
## [1] 10 11 12 13
## $`25-50`
## $`25-50`[[1]]
## [1] 2 3 4 5
## 
## $`25-50`[[2]]
## [1] 6 7 8 9
## 
## 
## $`25-100`
## $`25-100`[[1]]
## [1] 2 3 4 5
## 
## $`25-100`[[2]]
## [1] 10 11 12 13
## 
## 
## $`50-100`
## $`50-100`[[1]]
## [1] 6 7 8 9
## 
## $`50-100`[[2]]
## [1] 10 11 12 13
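
Index lists of this shape can be built programmatically. A sketch, assuming an equal number of replicates per condition and abundance columns ordered by condition; the variable names are illustrative.

```r
conditions <- c("25", "50", "100")  # condition labels from the output above
n_rep <- 4                          # replicates per condition
first_col <- 2                      # column 1 holds the identifier

# One index vector per condition, in column order.
cond_idx <- lapply(seq_along(conditions), function(i) {
  seq(first_col + (i - 1) * n_rep, length.out = n_rep)
})
names(cond_idx) <- conditions

# Every unordered pair of conditions, each holding two index vectors.
pairs <- combn(conditions, 2, simplify = FALSE)
pair_idx <- lapply(pairs, function(p) unname(cond_idx[p]))
names(pair_idx) <- vapply(pairs, paste, character(1), collapse = "-")
```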

Rename the abundance columns in a simplified “Condition-Replicate” format.

##  [1] "Accession" "25-1"      "25-2"      "25-3"      "25-4"     
##  [6] "50-1"      "50-2"      "50-3"      "50-4"      "100-1"    
## [11] "100-2"     "100-3"     "100-4"
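
The “Condition-Replicate” names can be generated rather than typed. A sketch with a placeholder table of zeros standing in for the real data:

```r
conditions <- c("25", "50", "100")
n_rep <- 4

# "25-1", "25-2", ..., "100-4": repeat each condition, cycle the replicates.
new_names <- c(
  "Accession",
  paste(rep(conditions, each = n_rep), seq_len(n_rep), sep = "-")
)

# Placeholder table with one identifier column and 12 abundance columns.
abund <- as.data.frame(matrix(0, nrow = 2, ncol = length(new_names)))
names(abund) <- new_names
```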

Filter to keep identifiers with >50% of replicates having nonzero abundances in any condition.
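
A sketch of this filter in base R, using two toy conditions (A and B) with four replicates each. The threshold is strict, so exactly half of the replicates being nonzero is not enough.

```r
# Toy abundance table: identifier column followed by replicate columns.
abund <- data.frame(
  Accession = c("P1", "P2", "P3"),
  `A-1` = c(10, 0, 0), `A-2` = c(12, 0, 5),
  `A-3` = c(0, 0, 0),  `A-4` = c(9, 3, 0),
  `B-1` = c(0, 0, 0),  `B-2` = c(0, 0, 0),
  `B-3` = c(0, 7, 0),  `B-4` = c(0, 8, 0),
  check.names = FALSE, stringsAsFactors = FALSE
)
cond_idx <- list(A = 2:5, B = 6:9)

# For each condition, flag rows where more than half the replicates are nonzero.
passes <- lapply(cond_idx, function(idx) rowMeans(abund[, idx] > 0) > 0.5)

# Keep a row if it passes in at least one condition.
keep <- Reduce(`|`, passes)
filtered <- abund[keep, , drop = FALSE]
```

Here only P1 survives: it has 3 of 4 nonzero replicates in condition A, while P2 reaches only 2 of 4 in condition B.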

Perform a log2-transformation of the abundances as a variance-stabilizing step. The base R function log2() returns “-Inf” for zero values, so these instances are changed to “NA” after the transformation.
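
A minimal illustration on a toy matrix:

```r
# Toy abundance matrix with one zero value.
abund <- matrix(c(1024, 0, 8, 256), nrow = 2,
                dimnames = list(c("P1", "P2"), c("A-1", "A-2")))

log_abund <- log2(abund)                  # log2(0) yields -Inf
log_abund[is.infinite(log_abund)] <- NA   # mark zeros as missing
```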

Impute missing values (“NA”) using a conditional strategy with the imp4p package. Iterate by condition and check whether each identifier has reliable quantitation.

  • If at least one replicate has a nonzero abundance, impute other replicates with values drawn from a normal distribution centered on the mean of nonzero replicates.

  • If no replicate has a nonzero abundance, impute with small values drawn from a normal distribution centered on the lower 25th percentile of abundances.
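
The conditional logic can be illustrated with a base R stand-in. This sketch does not call imp4p. It reads the second case as rows with no observed replicate in the condition, and the function name `impute_condition`, the fallback spread `low_sd`, and the use of all observed values in the condition for the 25th percentile are all assumptions made for illustration.

```r
set.seed(42)  # make the random draws reproducible

# x: matrix of log2 abundances for ONE condition (rows = identifiers).
impute_condition <- function(x, low_sd = 0.3) {
  # Center for fully missing rows: 25th percentile of observed values.
  low_center <- unname(quantile(x, 0.25, na.rm = TRUE))
  for (i in seq_len(nrow(x))) {
    miss <- is.na(x[i, ])
    if (!any(miss)) next
    obs <- x[i, !miss]
    if (length(obs) > 0) {
      # Partially observed: draw around the mean of the observed replicates.
      spread <- if (length(obs) > 1) sd(obs) else low_sd
      x[i, miss] <- rnorm(sum(miss), mean = mean(obs), sd = spread)
    } else {
      # Nothing observed: draw small values around the low percentile.
      x[i, miss] <- rnorm(sum(miss), mean = low_center, sd = low_sd)
    }
  }
  x
}

log_abund <- rbind(
  P1 = c(20.1, 20.4, NA, 19.9),
  P2 = c(NA, NA, NA, NA),
  P3 = c(15.2, 15.0, 15.5, 15.1)
)
imputed <- impute_condition(log_abund)
```

Observed values are never overwritten; only the `NA` cells receive draws.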

3.3.1 Pairwise t-test

Perform a t-test on each identifier for defined condition pairs.

  • By default, a two-sided, equal-variance t-test is run with Benjamini-Hochberg FDR correction.
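
A sketch of a row-wise test for one condition pair using base R's `t.test()` and `p.adjust()`. The data are simulated, and the index vectors `idx_a` and `idx_b` are illustrative names.

```r
set.seed(1)

# Simulated log2 abundances: 5 identifiers, two conditions of 4 replicates.
log_abund <- matrix(rnorm(40, mean = 20), nrow = 5,
                    dimnames = list(paste0("P", 1:5), NULL))
log_abund[1, 5:8] <- log_abund[1, 5:8] + 4  # shift one identifier upward

idx_a <- 1:4
idx_b <- 5:8

# Two-sided, equal-variance (Student) t-test for each identifier.
p_values <- apply(log_abund, 1, function(row) {
  t.test(row[idx_a], row[idx_b],
         alternative = "two.sided", var.equal = TRUE)$p.value
})

# Benjamini-Hochberg adjustment across all identifiers.
fdr <- p.adjust(p_values, method = "BH")
```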

3.3.2 One-way ANOVA

Perform a one-way analysis of variance (ANOVA) on each identifier across conditions.

  • By default, a one-way ANOVA is run for all unique conditions with Benjamini-Hochberg FDR correction.
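
A comparable sketch for the ANOVA, using base R's `oneway.test()` with `var.equal = TRUE`, which gives the classical one-way F-test. The data are simulated and the condition labels reuse those shown earlier.

```r
set.seed(1)

# Condition label for each abundance column (3 conditions x 4 replicates).
conditions <- factor(rep(c("25", "50", "100"), each = 4),
                     levels = c("25", "50", "100"))

# Simulated log2 abundances: 5 identifiers by 12 columns.
log_abund <- matrix(rnorm(60, mean = 20), nrow = 5,
                    dimnames = list(paste0("P", 1:5), NULL))

# One-way ANOVA p-value for each identifier.
p_values <- apply(log_abund, 1, function(row) {
  oneway.test(row ~ conditions, var.equal = TRUE)$p.value
})

# Benjamini-Hochberg adjustment across all identifiers.
fdr <- p.adjust(p_values, method = "BH")
```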

3.3.3 Fold change

Calculate fold change using the mean replicate abundance for each condition.

  • Subtracting log2-transformed values is equivalent to dividing the non-transformed values and then transforming: \(\log_2(B) - \log_2(A) = \log_2(B/A)\)
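
A worked example on toy log2 abundances. Because the means are taken on the log scale, the back-transformed ratio is a ratio of geometric means.

```r
# Toy log2 abundances: columns 1-2 are condition A, columns 3-4 condition B.
log_abund <- rbind(P1 = c(10, 10, 12, 12),
                   P2 = c(8, 8, 7, 7))
idx_a <- 1:2
idx_b <- 3:4

# log2 fold change of B relative to A: difference of mean log2 abundances.
log2_fc <- rowMeans(log_abund[, idx_b, drop = FALSE]) -
           rowMeans(log_abund[, idx_a, drop = FALSE])

# Back-transform to a linear-scale ratio.
fold_change <- 2^log2_fc
```

P1 has a log2 fold change of 2 (a 4-fold increase) and P2 has -1 (a 2-fold decrease).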

5 AnnotateLFQ

5.1 Packages

Load required packages into R session.

5.2 Functions

## SHA-1 hash of file is d2edc6c602c09ddab41aa269eaa56ea3e588a79f

5.3 UniProtKB API

Given a list of protein accessions, access UniProtKB and pull known information.

6 Output

6.1 Saving Data from R

The write_csv() function from the readr package in the tidyverse is one option for saving a copy of a data frame to the working directory.
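
A short sketch of saving a results table. The table contents and file name are hypothetical, a temporary directory is used instead of the working directory to avoid leaving files behind, and base R's `write.csv()` is used as a fallback when readr is not installed.

```r
# Toy results table (hypothetical values).
results <- data.frame(
  Accession = c("P1", "P2"),
  log2FC    = c(2, -1),
  stringsAsFactors = FALSE
)

# Hypothetical output path; replace tempdir() with "." for the working directory.
out_file <- file.path(tempdir(), "results.csv")

if (requireNamespace("readr", quietly = TRUE)) {
  readr::write_csv(results, out_file)
} else {
  utils::write.csv(results, out_file, row.names = FALSE)
}
```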